Dictionary Look-Up within Small Edit Distance
Abstract
Let W be a dictionary consisting of n binary strings of length m each, represented as a trie. The usual d-query asks if there exists a string in W within Hamming distance d of a given binary query string q. We present an algorithm to determine if there is a member in W within edit distance d of a given query string q of length m. The method takes time O(dm^{d+1}) in the RAM model, independent of n, and requires O(dm) additional space.
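To make the query concrete, here is a minimal Python sketch of an edit-distance look-up over a trie. It is a naive branching search, not the algorithm of the paper, and it makes no attempt to reach the stated O(dm^{d+1}) time or O(dm) space bounds. The names build_trie and within_edit_distance, the nested-dict trie layout, and the "$" end-of-word marker are assumptions made only for this illustration.

```python
def build_trie(words):
    """Build a trie as nested dicts; the key "$" marks the end of a word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def within_edit_distance(root, q, d):
    """Return True if some word in the trie is within edit distance d of q.

    Naive recursive search: at each step it tries a match or substitution,
    an extra character taken from the dictionary word, and dropping a
    character of q. Each edit spends one unit of the budget k.
    """
    def search(node, i, k):
        # Whole query consumed and a dictionary word ends here.
        if i == len(q) and "$" in node:
            return True
        for ch, child in node.items():
            if ch == "$":
                continue
            if i < len(q):
                # Match (cost 0) or substitution (cost 1).
                cost = 0 if ch == q[i] else 1
                if cost <= k and search(child, i + 1, k - cost):
                    return True
            # The dictionary word has a character that q lacks.
            if k > 0 and search(child, i, k - 1):
                return True
        # q has a character that the dictionary word lacks.
        if i < len(q) and k > 0 and search(node, i + 1, k - 1):
            return True
        return False

    return search(root, 0, d)


W = build_trie(["0000", "1111", "010101"])
print(within_edit_distance(W, "0001", 1))    # one substitution away from "0000"
print(within_edit_distance(W, "101010", 2))  # drop the leading 1, append a 1
```

The second query shows why edit distance differs from Hamming distance: "101010" and "010101" differ in every position, yet one deletion and one insertion turn one into the other.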
Similar resources
Efficient approximate dictionary look-up over small alphabets
Given a dictionary W consisting of n binary strings of length m each, a d-query asks if there exists a string in W within Hamming distance d of a given binary query string q. The problem was posed by Minsky and Papert in 1969 [10] as a challenge to data structure design. Efficient solutions have been developed only for the special case when d = 1 (the 1-query problem). We assume the standard RA...
Compressed String Dictionary Look-Up with Edit Distance One
In this paper we present different solutions for the problem of indexing a dictionary of strings in compressed space. Given a pattern P, the index has to report all the strings in the dictionary having edit distance at most one with P. Our first solution is able to solve queries in (almost optimal) O(|P| + occ) time where occ is the number of strings in the dictionary having edit distance at ...
Lexical Access via Phoneme to Grapheme Conversion
The Lexical Access (LA) problem in Computer Science aims to match a phoneme sequence produced by the user to a correctly spelled word in a lexicon, with minimal human intervention and in a short amount of time. Lexical Access is useful in the case where the user knows the spoken form of a word but cannot guess its written form or where the user's best guess is inappropriate for look-up in a stan...
Phrase-Based Statistical Machine Translation Using Approximate Matching
Phrase-based statistical models constitute one of the most competitive pattern-recognition approaches to machine translation. In this case, the source sentence is fragmented into phrases, then, each phrase is translated by using a stochastic dictionary. One shortcoming of this phrase-based model is that it does not have an adequate generalization capability. If a sequence of words has not been ...
Algorithme de recherche approximative dans un dictionnaire fondé sur une distance d'édition définie par blocs
We propose an algorithm for approximate dictionary lookup, where altered strings are matched against reference forms. The algorithm makes use of a divergence function between strings, broadly belonging to the family of edit distances; it finds dictionary entries whose distance to the search string is below a certain threshold. The divergence function is not the classical edit distance (DL dis...